Transporting long-lived quantum spin coherence in a photonic crystal fiber
Confining particles in hollow-core photonic crystal fibers has opened up new
prospects to scale up the distance and time over which particles can be made to
interact with light. However, maintaining long-lived quantum spin coherence
and/or transporting it over macroscopic distances in a waveguide remain
challenging. Here, we demonstrate coherent guiding of ground-state
superpositions of 85Rb atoms over a centimeter range and hundreds of
milliseconds inside a hollow-core photonic crystal fiber. The decoherence is
mainly due to dephasing from the residual differential light shift (DLS) of the
optical trap and the inhomogeneity of the ambient magnetic field. Our experiment
establishes an important step towards a versatile platform for applications in
quantum information networks and matter-wave circuits for quantum sensing.
Comment: Accepted by Physical Review Letters
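The dephasing mechanism described in this abstract, inhomogeneous shifts across the atomic ensemble, can be illustrated with a short numerical sketch. The Gaussian shift distribution, its width, and all parameter values below are illustrative assumptions, not values from the experiment.

```python
import numpy as np

def ensemble_coherence(t, delta_std, n_atoms=10_000, seed=0):
    """Mean coherence of an ensemble whose members accumulate phase
    delta * t, with shifts delta drawn from a zero-mean Gaussian.
    The decay arises purely from averaging over the spread (dephasing),
    not from any single-atom decoherence."""
    rng = np.random.default_rng(seed)
    delta = rng.normal(0.0, delta_std, n_atoms)  # rad/s, assumed spread
    return np.abs(np.mean(np.exp(1j * delta * t)))

# A Gaussian spread of shifts gives Gaussian decay of the averaged
# coherence, C(t) ~ exp(-(delta_std * t)^2 / 2).
for t in (0.0, 0.1, 0.3):
    print(t, ensemble_coherence(t, delta_std=10.0))
```

This is why reducing the residual DLS spread (e.g. by a magic-wavelength trap) directly lengthens the observable coherence time of the ensemble average.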
Unsupervised Acoustic Unit Representation Learning for Voice Conversion using WaveNet Auto-encoders
Unsupervised representation learning of speech has been of keen interest in
recent years, which is for example evident in the wide interest of the
ZeroSpeech challenges. This work presents a new method for learning frame level
representations based on WaveNet auto-encoders. Of particular interest in the
ZeroSpeech Challenge 2019 were models with discrete latent variables such as the
Vector Quantized Variational Auto-Encoder (VQVAE). However, these models
generate speech with relatively poor quality. In this work we aim to address
this with two approaches: first WaveNet is used as the decoder and to generate
waveform data directly from the latent representation; second, the quality of
low-complexity latent representations is improved with two alternative
disentanglement learning methods, namely instance normalization and sliced
vector quantization. The method was developed and tested in the context of the
recent ZeroSpeech challenge 2020. The system output submitted to the challenge
obtained the top position for naturalness (Mean Opinion Score 4.06), top
position for intelligibility (Character Error Rate 0.15), and third position
for the quality of the representation (ABX test score 12.5). These results and
further analysis in this paper illustrate that the quality of the converted
speech and of the acoustic unit representation can be well balanced.
Comment: To be presented at Interspeech 202
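One of the disentanglement methods named in this abstract, instance normalization, can be sketched in a few lines: per-channel statistics over time, which carry much of the speaker identity, are removed from the features. The feature shapes and the synthetic "speakers" below are stand-ins, not the paper's actual setup.

```python
import numpy as np

def instance_norm(features, eps=1e-5):
    """Instance normalization over the time axis of a (channels, frames)
    feature matrix: per-channel mean and variance, which encode much of
    the speaker identity, are removed, keeping content-bearing variation."""
    mean = features.mean(axis=1, keepdims=True)
    std = features.std(axis=1, keepdims=True)
    return (features - mean) / (std + eps)

# Two synthetic "speakers": the same content under different
# per-channel offsets and scales.
rng = np.random.default_rng(0)
content = rng.normal(size=(4, 50))
spk_a = 2.0 * content + 1.0
spk_b = 0.5 * content - 3.0
# After IN, the speaker-dependent statistics are gone:
print(np.allclose(instance_norm(spk_a), instance_norm(spk_b), atol=1e-3))  # True
```

This invariance to per-channel affine differences is what makes IN attractive as a simple, training-free disentanglement step before vector quantization.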
Simulation-optimization with machine learning for geothermal reservoir recovery: Current status and future prospects
In geothermal reservoir management, combined simulation-optimization is a practical approach to achieving the optimal well placement and operation that maximize energy recovery and reservoir longevity. The use of machine learning models is often essential to make simulation-optimization computationally feasible. Tools from machine learning can be used to construct data-driven and often physics-free approximations of the numerical model response, with computational times often several orders of magnitude smaller than those required by reservoir numerical models. In this short perspective, we explain the background and current status of machine-learning-based combined simulation-optimization in geothermal reservoir management, and discuss several key issues that will likely form future directions.
Cited as: Rajabi, M. M., Chen, M. Simulation-optimization with machine learning for geothermal reservoir recovery: Current status and future prospects. Advances in Geo-Energy Research, 2022, 6(6): 451-453. https://doi.org/10.46690/ager.2022.06.0
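The surrogate-based loop this perspective describes, sample the expensive simulator, fit a cheap data-driven approximation, then optimize over the surrogate, can be sketched as follows. The one-parameter quadratic "simulator" and all names here are illustrative stand-ins for a real reservoir model.

```python
import numpy as np

def expensive_simulator(x):
    """Stand-in for a costly reservoir simulation: energy recovery as a
    function of a single well-placement parameter x (illustrative)."""
    return -(x - 0.6) ** 2 + 1.0

# 1) Run the simulator at a few design points (the expensive step).
xs = np.linspace(0.0, 1.0, 7)
ys = np.array([expensive_simulator(x) for x in xs])

# 2) Fit a cheap, physics-free surrogate (here: quadratic regression).
surrogate = np.poly1d(np.polyfit(xs, ys, deg=2))

# 3) Optimize over the surrogate instead of the simulator.
grid = np.linspace(0.0, 1.0, 1001)
x_best = grid[np.argmax(surrogate(grid))]
print(round(x_best, 2))  # recovers the optimum near x = 0.6
```

In practice the surrogate would be a neural network or Gaussian process over many well parameters, and the loop would iterate, re-running the simulator near the surrogate's optimum to refine the fit.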
Towards Low-Resource StarGAN Voice Conversion using Weight Adaptive Instance Normalization
Many-to-many voice conversion with non-parallel training data has seen
significant progress in recent years. StarGAN-based models have attracted
considerable interest for voice conversion. However, most StarGAN-based methods have focused on
voice conversion experiments for the situations where the number of speakers
was small, and the amount of training data was large. In this work, we aim at
improving the data efficiency of the model and achieving a many-to-many
non-parallel StarGAN-based voice conversion for a relatively large number of
speakers with limited training samples. In order to improve data efficiency,
the proposed model uses a speaker encoder for extracting speaker embeddings and
conducts adaptive instance normalization (AdaIN) on convolutional weights.
Experiments are conducted with 109 speakers under two low-resource situations,
where the number of training samples is 20 and 5 per speaker. An objective
evaluation shows the proposed model is better than the baseline methods.
Furthermore, a subjective evaluation shows that, for both naturalness and
similarity, the proposed model outperforms the baseline method.
Comment: Accepted by ICASSP202
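The core mechanism named in this abstract, adaptive instance normalization applied to convolutional weights, can be sketched as below. The weight layout, the shapes, and the random stand-ins for the speaker-encoder outputs are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def weight_adain(weights, gamma, beta, eps=1e-5):
    """Adaptive instance normalization on convolutional weights: each
    output filter is normalized, then rescaled and shifted by parameters
    predicted from a speaker embedding. Assumed layout: weights has
    shape (out_channels, in_channels, kernel); gamma and beta have
    shape (out_channels,)."""
    flat = weights.reshape(weights.shape[0], -1)
    mean = flat.mean(axis=1, keepdims=True)
    std = flat.std(axis=1, keepdims=True)
    normed = (flat - mean) / (std + eps)
    out = gamma[:, None] * normed + beta[:, None]
    return out.reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 3))
# In the paper's setting gamma and beta would come from a speaker
# encoder; here they are random stand-ins.
gamma = rng.normal(size=8)
beta = rng.normal(size=8)
w_adapted = weight_adain(w, gamma, beta)
print(w_adapted.shape)  # (8, 4, 3)
```

Adapting the weights rather than the activations means one shared backbone can be steered to many target speakers by a small per-speaker parameter set, which is the data-efficiency argument of the abstract.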
Disentanglement Learning for Text-Free Voice Conversion
Voice conversion (VC) aims to change the perceived speaker identity of a speech signal from one speaker to another, while preserving the linguistic content. Recent state-of-the-art VC systems typically depend on automatic speech recognition (ASR) models and have achieved great success. Results of recent challenges show that these VC systems have reached a level of performance close to real human voices. However, they rely heavily on the performance of the ASR models, which may degrade in practical applications because of the mismatch between training and test data.
VC systems independent of ASR models are typically regarded as text-free systems. They commonly apply disentanglement learning methods to remove the speaker information from a speech signal, for example, vector quantisation (VQ) or instance normalisation (IN). However, text-free VC systems have not reached the same level of performance as text-dependent systems. This thesis mainly studies disentanglement learning methods for improving the performance of text-free VC systems. Three major contributions are summarised as follows.
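The vector quantisation mentioned above can be sketched as a nearest-codebook lookup: each frame is replaced by the closest learned code, a bottleneck that discards fine-grained, speaker-specific detail. The codebook size, dimensions, and random values below are illustrative assumptions.

```python
import numpy as np

def vector_quantize(frames, codebook):
    """Map each frame to its nearest codebook entry (Euclidean
    distance). The discrete bottleneck discards fine-grained,
    speaker-specific detail, keeping only the code identity."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
    codes = d.argmin(axis=1)
    return codes, codebook[codes]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))   # 16 learned codes (stand-ins)
# Noisy copies of code 3: the perturbations are quantized away.
frames = codebook[3] + 0.01 * rng.normal(size=(5, 8))
codes, quantized = vector_quantize(frames, codebook)
print(codes)  # all five frames snap to code 3
```

The "information loss issue" the thesis studies is visible here: whatever the perturbations encoded (prosody, residual content) is irrecoverable after quantization, which motivates the alternative disentanglement methods explored in the first contribution.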
Firstly, in order to improve the performance of an auto-encoder based VC model, the information loss issue caused by the VQ of the model is studied. Two disentanglement learning methods are exploited to replace the VQ of the model. Experiments show that these two methods improve the naturalness and intelligibility of the model, but hurt its speaker similarity performance. The reason for this degradation is studied in further analysis experiments.
Next, the performance and the robustness of Generative Adversarial Network (GAN) based VC models are studied. In order to improve the performance and the robustness of a GAN-based VC model, a new model is proposed. This new model introduces a new speaker adaptation layer to alleviate the information loss caused by a speaker adaptation method based on IN. Experiments show that the proposed model outperforms the baseline models in VC performance and robustness.
The third contribution studies whether Self-Supervised Learning (SSL) based VC models can reach the same level of performance as the state-of-the-art text-dependent models. An encoder-decoder framework is established for the experiments. In this framework, the performance of a VC system implemented with an SSL model can be compared to that of a VC system implemented with an ASR model. Experimental results show that SSL-based VC models can reach the same level of naturalness as the state-of-the-art text-dependent VC models. SSL-based VC models also gain an advantage in intelligibility when tested on out-of-domain target speakers, but perform worse on speaker similarity.
SALSA: Attacking Lattice Cryptography with Transformers
Currently deployed public-key cryptosystems will be vulnerable to attacks by
full-scale quantum computers. Consequently, "quantum resistant" cryptosystems
are in high demand, and lattice-based cryptosystems, based on a hard problem
known as Learning With Errors (LWE), have emerged as strong contenders for
standardization. In this work, we train transformers to perform modular
arithmetic and combine half-trained models with statistical cryptanalysis
techniques to propose SALSA: a machine learning attack on LWE-based
cryptographic schemes. SALSA can fully recover secrets for small-to-mid-size
LWE instances with sparse binary secrets, and may scale to attack real-world
LWE-based cryptosystems.
Comment: Extended version of work published at NeurIPS 202
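The Learning With Errors problem targeted by SALSA is easy to state concretely: given many noisy inner products with a hidden vector, recover that vector. The sketch below generates toy-sized LWE samples with the sparse binary secret distribution the abstract mentions; all parameter sizes are illustrative, far below cryptographic strength.

```python
import numpy as np

def lwe_samples(n, m, q, secret, e_std=3.0, seed=0):
    """Generate m LWE samples (A, b = A s + e mod q). Recovering s from
    (A, b) is the hard problem underlying lattice cryptosystems; SALSA
    targets the sparse-binary-secret variant. Toy parameters only."""
    rng = np.random.default_rng(seed)
    A = rng.integers(0, q, size=(m, n))
    e = np.rint(rng.normal(0.0, e_std, size=m)).astype(int)
    b = (A @ secret + e) % q
    return A, b, e

q = 3329                   # toy modulus (illustrative size)
n, m = 32, 64
rng = np.random.default_rng(1)
secret = (rng.random(n) < 0.1).astype(int)   # sparse binary secret
A, b, e = lwe_samples(n, m, q, secret)
# With the secret in hand, the residual is just the small error term:
print(np.array_equal((b - A @ secret) % q, e % q))  # True
```

Without the secret, b looks uniformly random modulo q; the attack's job is to exploit the sparsity and binarity of s to undo that masking.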
SCALLOP-HD: group action from 2-dimensional isogenies
We present SCALLOP-HD, a novel group action that builds upon the recent SCALLOP group action introduced by De Feo, Fouotsa, Kutas, Leroux, Merz, Panny and Wesolowski in 2023. While our group action uses the same class-group action on oriented curves as SCALLOP, we introduce a different orientation representation: the new representation embeds an endomorphism generating the orientation in an isogeny between abelian varieties of dimension 2 via Kani's Lemma, and it comes with a simple algorithm to compute the class group action. Our new approach considerably simplifies the SCALLOP framework and potentially surpasses it in efficiency, a claim to be confirmed by implementation results. Additionally, our approach streamlines parameter selection. The new representation allows us to efficiently select a class group of smooth order, enabling polynomial-time generation of the lattice of relations and hence enhancing scalability in contrast to SCALLOP.
To instantiate our SCALLOP-HD group action, we introduce a new technique to apply Kani's Lemma in dimension 2 with an isogeny diamond obtained from commuting endomorphisms. This method allows one to represent arbitrary endomorphisms with isogenies in dimension 2, and may be of independent interest.
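The object both SCALLOP and SCALLOP-HD construct is a commutative group action: applying group elements in either order reaches the same point. The toy below illustrates only that abstract property, replacing the isogeny-based action with modular exponentiation; it shares the structure but none of the security properties, and every value is a stand-in.

```python
# Toy commutative group action: exponents (coprime to p - 1) acting on
# elements of Z_p^* by exponentiation. Oriented curves play the role of
# x, and class-group elements play the role of the exponents.
p = 467                      # small prime; the group Z_p^* has order 466

def act(a, x):
    """Action of exponent a (assumed coprime to p - 1) on x in Z_p^*."""
    return pow(x, a, p)

x = 2                        # base point (stand-in for an oriented curve)
a, b = 5, 9                  # both coprime to 466
# The defining property: the action commutes and composes in the group.
assert act(a, act(b, x)) == act(b, act(a, x)) == act(a * b % (p - 1), x)
print("commutes:", act(a, act(b, x)))
```

This commutativity is what enables Diffie-Hellman-style key exchange from such actions, and why making the action efficient to evaluate, the focus of SCALLOP-HD, matters.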